Abstract

We describe the plans and objectives of the CEDAR
project (Combined e-Science Data Analysis Resource for
High Energy Physics) newly funded by the PPARC
e-Science programme in the UK. CEDAR will combine the
strengths of the well-established and widely used HEPDATA
database of HEP data and the innovative JetWeb
data/Monte Carlo comparison facility, built on the HZTOOL
package, and will exploit developing Grid technology.
The current status and future plans of both of these
individual sub-projects within the CEDAR framework are 
described, showing how they will cohesively provide (a) an 
extensive archive of Reaction Data, (b) validation and tun- 
ing of Monte Carlo programs against these reaction data 
sets, and (c) a validated code repository for a wide range of 
HEP code such as parton distribution functions and other 
calculation codes used by particle physicists. Once estab- 
lished it is envisaged CEDAR will become an important 
Grid tool used by LHC experimentalists in their analyses 
and may well serve as a model in other branches of sci- 
ence where there is a need to compare data and complex 
simulations. 

THE PHYSICS PROBLEM 

Particle physics experiments at high-energy accelerators 
provide a wealth of data on the final state in electron- 
positron, lepton-proton and proton-(anti)proton interac- 
tions. These data represent a triumph for the Standard 
Model, particularly in precision electroweak measurements 
and the verification of the QCD sector to an impressive de- 
gree of precision. 

Despite these successes, several aspects of high-energy 
collisions are still poorly understood, often due to technical 
difficulties in the calculation of non-perturbative or com- 
plex perturbative effects. Such lack of understanding can 
be a limiting factor in the accuracy of new measurements. 
Examples with particular relevance to the LHC include par- 
ton distribution functions (PDFs) in hadrons, hadronisa- 
tion in the final state, multijet production and "underlying 
events". Fig. 1 illustrates some of these processes in a complex
high-energy event. All these processes are calculated
and/or modelled in Monte Carlo simulation and calcula- 
tion programs that employ fits to existing data. Consistent 
tuning of the free parameters of these models, and confirmation
of the physics assumptions they contain, is a non-trivial
matter since the measurements are made with a variety
of colliding beams, in many different regions of phase
space, and for many complex observables.

Figure 1: The complexity of a collision process.



THE SOLUTION - CEDAR 

A solution to the above problem will be provided by 
CEDAR, a project newly funded by the PPARC e-Science 
programme in the UK, which will construct a resource for 
particle physics enabling the predictions of Monte Carlos, 
and other calculation programs, to be easily compared with 
real data. CEDAR will allow the parameters of the mod- 
els to be varied and to be simultaneously compared to as 
wide a range of data distributions as is necessary to main- 
tain global consistency. These global comparisons are vi- 
tally important as it is quite possible to obtain a good fit to 
a new set of data but at the same time lose the quality of fit 
to other existing data. 
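
The danger described above can be made concrete with a small sketch. It assumes, purely for illustration, that each data set is a list of (measured value, uncertainty) points and that a model prediction is a list of values in the same bins. The function names and all numbers are hypothetical and are not taken from JetWeb or from any measurement.

```python
def chi2(data, prediction):
    """Chi-squared of one data set: sum over bins of ((d - p) / sigma)^2."""
    return sum(((d - p) / sigma) ** 2 for (d, sigma), p in zip(data, prediction))


def global_chi2(datasets, predictions):
    """Per-data-set chi-squared plus the total over all data sets.

    datasets:    {name: [(value, uncertainty), ...]}
    predictions: {name: [predicted value per bin, ...]}
    """
    per_set = {name: chi2(points, predictions[name]) for name, points in datasets.items()}
    return per_set, sum(per_set.values())


# Invented numbers: two data sets with two bins each.
datasets = {
    "old": [(10.0, 1.0), (20.0, 2.0)],
    "new": [(5.0, 0.5), (8.0, 1.0)],
}
tune_a = {"old": [10.0, 20.0], "new": [6.0, 10.0]}  # describes "old" exactly
tune_b = {"old": [13.0, 26.0], "new": [5.0, 8.0]}   # describes "new" exactly

per_a, total_a = global_chi2(datasets, tune_a)
per_b, total_b = global_chi2(datasets, tune_b)
```

In this invented case tune B describes the new data perfectly, yet its total chi-squared (18) is larger than that of tune A (8) because the older data are described less well.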

CEDAR will combine the strengths of the established
and widely used HEPDATA database [1] of high-energy
physics data and the innovative JetWeb data comparison
facility [2], and will exploit developing Grid technology.

In short:- 

CEDAR = JetWeb + HEPDATA + more... 

The individual parts, including the "more...", are described
below.

JetWeb 

JetWeb is a WWW interface, developed at UCL, which 
provides a facility for direct comparison between the pre- 
dictions of Monte Carlo programs and measured physics 
distributions from experiments. It is based upon the HZTOOL
program [3] which generates the physics distributions from a
given Monte Carlo program by calculating the relevant variables
and applying the cuts used in the published measurements.

[Screenshot of the JetWeb home page, with links to the latest and all fits for HERWIG and PYTHIA, a database search, documentation and selected results.]

Figure 2: The JetWeb home web page

The JetWeb interface allows the user to request and dis- 
play quantitative comparisons of chosen Monte Carlo mod- 
els with any number of measured data distributions. In this 
way the compatibility with the older data sets can be main- 
tained when tuning to new data sets. 

Since the generation of Monte Carlo events to calculate
the predicted distributions to the required accuracy is
a very compute-intensive operation, JetWeb maintains a
database (MySQL) of Monte Carlo generated data which
can be queried, or added to at user request, thus increasing the
available statistics. JetWeb is already "grid-enabled" and
has submitted its Monte Carlo jobs to GridPP with successful
outcomes, as well as directly using the computer
systems at UCL, Manchester and Sheffield.
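
The benefit of keeping generated results in a database can be sketched as follows. The sketch assumes, as an illustration only, that each stored run records the number of generated events and the raw event count per histogram bin, so that runs made with the same model can be added bin by bin. The data layout and function names are hypothetical and do not reflect the actual JetWeb schema.

```python
import math


def merge_runs(runs):
    """Add raw bin counts and event totals from runs of the same model.

    Each run is a dict: {"events": int, "counts": [int per bin]}.
    """
    n_bins = len(runs[0]["counts"])
    if any(len(run["counts"]) != n_bins for run in runs):
        raise ValueError("all runs must use the same binning")
    merged = [sum(run["counts"][i] for run in runs) for i in range(n_bins)]
    return {"events": sum(run["events"] for run in runs), "counts": merged}


def relative_errors(run):
    """Poisson relative statistical error per bin, 1/sqrt(N); inf for empty bins."""
    return [1.0 / math.sqrt(n) if n > 0 else math.inf for n in run["counts"]]


stored = {"events": 1000, "counts": [100, 25]}     # already in the database
requested = {"events": 3000, "counts": [300, 75]}  # added at user request
combined = merge_runs([stored, requested])
```

With these invented counts the relative statistical error in each bin is halved once the second run is added, since the combined sample is four times larger.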

Fig. 2 shows the JetWeb WWW home page, which provides
the user with the starting point to begin a new fit, or
to search the JetWeb database for results from previous fits.

Fig. 3 shows the main components of the JetWeb server.
Interaction with the user is via two Java Servlets, running in
a Tomcat servlet container linked to an Apache web server:
the Searcher servlet provides the standard functions
available to a general user; the Maintainer provides additional
functions, such as adding new data, to the JetWeb
administrators.

At the heart of JetWeb is the Java Object Model (JOM) 
which contains the properties and interactions of the 
Model, Papers, Plots and Fits. (In this context a Model 
means a unique generator version and set of parameters.) 
Data is converted between formats via this model, e.g. from 
Database to HTML display. The JOM interacts with the 
JetWeb database, a MySQL database which stores experimental
data as well as predictions from various models.
The predictions are generated by running the Monte Carlo
generators within the HZTOOL package, and the results
are available to the user who requested the run and to any
future user who makes a similar request.

Figure 3: The various components of JetWeb

At present, the size of the JetWeb database is restricted
by the relatively small proportion of experimental data
distributions which have HZTOOL routines to generate the
predicted distributions from Monte Carlo generators.

[Screenshot of the HEPDATA Reaction Data page, showing the keyword search form, the other search methods (form interface, step-by-step search), the data reviews and the pre-defined event shape and jet searches.]

Figure 4: The HEPDATA Reaction Data page

CEDAR will do the following for JetWeb:-

• Re-design HZTOOL in OO structured C++ for long
term maintenance and development. This is vitally
important as new HEP code, and in particular new
Monte Carlos, are being written using C++ in such
ways.

• Incorporate new Monte Carlos. At present only HER- 
WIG and PYTHIA are available. 

• Use HEPDATA as the source of "real" data distribu- 
tions, giving access to more data including (eventu- 
ally) LHC data. 

• Develop the Web and Grid interface to the model val- 
idator data. 

It is stressed however that although direct access to the data 
from HEPDATA will increase the scope of experimental 
data available to JetWeb, it will always be necessary for 
any data to be used to have a corresponding HZTOOL (or 
some equivalent) routine. The CEDAR project will actively 
encourage the participation of the experimental community 
in producing these routines. 
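
What such a routine has to do can be sketched in a few lines. The following is an illustration only and is not HZTOOL code: it assumes a hypothetical event format in which each event is a list of jet transverse energies in GeV, applies an invented minimum-ET cut, and fills the leading-jet ET into the bins of a measured distribution.

```python
def fill_distribution(events, bin_edges, et_min):
    """Histogram the leading-jet ET of events passing the cut.

    events:    list of events, each a list of jet ET values (GeV)
    bin_edges: ascending bin edges of the published distribution
    et_min:    minimum jet ET required by the (hypothetical) measurement
    """
    counts = [0] * (len(bin_edges) - 1)
    for jets in events:
        selected = [et for et in jets if et >= et_min]  # apply the cut
        if not selected:
            continue  # event fails the selection
        leading = max(selected)  # the variable being measured
        for i in range(len(counts)):
            if bin_edges[i] <= leading < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Invented events: the second one has no jet above the cut.
events = [[12.0, 7.0], [4.0, 3.0], [25.0, 18.0, 6.0], [9.5]]
counts = fill_distribution(events, bin_edges=[5.0, 10.0, 20.0, 40.0], et_min=5.0)
```

The point of the sketch is that the variable definition and the cuts must match the published measurement exactly, which is why each data distribution needs its own routine.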

HEPDATA 

HEPDATA is a PPARC-funded project which has been
in existence now for over 25 years. Its principal aim has
remained essentially the same over this period, namely to
compile scattering data from all types of HEP reactions
(cross sections, event shapes, polarisations, etc.) and
to make the resulting compilations easily available to the
whole community.

More recently, other services, such as the hosting of mirrors
of the SLAC SPIRES databases and the Berkeley Particle
Data Group (PDG) Review of Particle Physics (RPP)
web pages in the UK, have been added to the HEPDATA
operation. HEPDATA also provides a unique and comprehensive
PDF code server with an on-line PDF calculation,
display and comparison facility. These are all accessible
from the main HEPDATA home web page, shown in Fig. 4.

The scope of the HEPDATA database covers cross sec- 
tions from all types of particle physics reactions. It is em- 
phasised that it does not contain "particle properties" which 
fall into the domain of the RPP of the Berkeley PDG. It
also does not contain raw data such as those found on the DSTs of
experiments. To appear in the database the data are generally
in the final published form. Ideally, to be most useful,
they should be fully corrected for acceptances and efficien- 
cies and be model independent. The database contains data 
from around 10000 publications dating from early experi- 
ments in the 1970s to the present day data from the LEP, 
Tevatron and HERA collaborations. It is regularly added 
to and updated. The data are obtained from journals and 
preprints and direct from the experiments especially when 
data appear only in graphical form in a publication. In the 
latter case the authors of the paper are contacted to obtain 
the exact numerical values shown in the plot. It is very 
important that this is done at the time of the publication 
as experience has shown how difficult it is to obtain
numerical values at a later date. Data are rarely read from
plots due to the difficulty in getting accurate representations.
Finally, verification of data entered into the database
is always sought from the experimenters themselves.

At present the HEPDATA project uses a non-proprietary
database management system (BDMS, the Berkeley
Database Management System), as it has since its
inception. This is a hierarchical DBMS in which data are
stored in a tree-like structure with the paper as the main
record unit and all the data tables in a particular publication
stored within that one unit (a data record). While this
DBMS has proved very resilient and stable over the years,
it is clearly not suitable for the present purpose of directly
interfacing with JetWeb, or with any other resource over
the Internet/Grid.
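
The contrast between the tree-like record structure described above and a relational layout can be illustrated with a small sketch. It assumes a hypothetical paper record held as a nested structure and flattens it into three relational tables. SQLite from the Python standard library stands in for a relational DBMS here, and the table names, column names and data values are invented for the example rather than taken from HEPDATA.

```python
import sqlite3

# Hypothetical hierarchical record: one paper containing all of its data tables.
record = {
    "reference": "Example Journal 1 (2000) 1",
    "tables": [
        {"observable": "DSIG/DET", "points": [(10.0, 5.2), (20.0, 1.1)]},
        {"observable": "SIG", "points": [(300.0, 42.0)]},
    ],
}


def load_record(conn, record):
    """Flatten one paper record into paper, data_table and data_point rows."""
    cur = conn.cursor()
    cur.execute("INSERT INTO paper (reference) VALUES (?)", (record["reference"],))
    paper_id = cur.lastrowid
    for table in record["tables"]:
        cur.execute(
            "INSERT INTO data_table (paper_id, observable) VALUES (?, ?)",
            (paper_id, table["observable"]),
        )
        table_id = cur.lastrowid
        cur.executemany(
            "INSERT INTO data_point (table_id, x, y) VALUES (?, ?, ?)",
            [(table_id, x, y) for x, y in table["points"]],
        )
    conn.commit()
    return paper_id


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE paper (id INTEGER PRIMARY KEY, reference TEXT);
    CREATE TABLE data_table (id INTEGER PRIMARY KEY,
                             paper_id INTEGER REFERENCES paper(id),
                             observable TEXT);
    CREATE TABLE data_point (id INTEGER PRIMARY KEY,
                             table_id INTEGER REFERENCES data_table(id),
                             x REAL, y REAL);
    """
)
load_record(conn, record)

# A query joining papers, tables and points: number of points per data table.
rows = conn.execute(
    "SELECT p.reference, t.observable, COUNT(*) FROM paper p "
    "JOIN data_table t ON t.paper_id = p.id "
    "JOIN data_point d ON d.table_id = t.id "
    "GROUP BY t.id ORDER BY t.id"
).fetchall()
```

Once the record is split in this way, individual data tables can be selected by observable or by paper with ordinary SQL, which is the kind of access an external client would need.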

CEDAR will do the following for HEPDATA:- 

• Migrate the data from the HEPDATA BDMS hierar- 
chical database to a MySQL relational database. Not 
only will this provide the necessary means to inter- 
face with JetWeb, and other resources if necessary, but 
it will also address the long term future needs of HEP- 
DATA by using a DBMS which is more standard and 
maintainable than its present one. 

• Integrate the new database into JetWeb as the source 
of its "real" data. This will expand the number of 
available data sets to which JetWeb has access. 

• Make the new database available on the Internet/Grid 
as a networked database for general use. This will 
involve implementing and expanding on the existing 
functionality of the HEPDATA web search and display 
methods. Expansions envisaged include direct access 
to the data through the conventional networks and also 
via the "Grid". 

• Develop new methods of direct entry and validation
of data by the experiments, thus putting them in control
of their own data. At present the entry and verification
of data in the database is generally instigated
and controlled by the HEPDATA personnel. In future 
CEDAR will seek methods and formats (e.g. XML) in 
which experiments can enter and maintain their own 
data in the database potentially using Grid technolo- 
gies and access validation methods. It should also be 
noted that the development of an XML document for- 
mat for transfer of particle physics data will be useful 
also for data output to the Grid etc. as well as input to 
HEPDATA. 
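
As an indication of what an XML document for data transfer might look like, the sketch below serialises one data table and reads it back. The element and attribute names are invented for this example and do not represent an agreed HEPDATA or CEDAR format; the values are placeholders.

```python
import xml.etree.ElementTree as ET


def table_to_xml(reaction, observable, points):
    """Build an XML string for one data table.

    points: list of (x, y, error) tuples.
    """
    root = ET.Element("dataset", {"reaction": reaction, "observable": observable})
    for x, y, err in points:
        ET.SubElement(root, "point", {"x": repr(x), "y": repr(y), "error": repr(err)})
    return ET.tostring(root, encoding="unicode")


def xml_to_points(text):
    """Parse the XML produced above back into (x, y, error) tuples."""
    root = ET.fromstring(text)
    return [
        (float(p.get("x")), float(p.get("y")), float(p.get("error")))
        for p in root.findall("point")
    ]


# Placeholder values, not real measurements.
points = [(10.0, 5.2, 0.3), (20.0, 1.1, 0.1)]
document = table_to_xml("e+ e- to hadrons", "cross section", points)
restored = xml_to_points(document)
```

The same document could serve for input from an experiment and for output to another resource, since the values survive the round trip unchanged.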



more... HEP CODE 

As well as the improvements to JetWeb and HEPDATA 
and their integration as discussed in the previous sections, it 
is also the intention of CEDAR to provide access to current 
and validated versions of various HEP theory and experi- 
ment software used at present, and in the future LHC era. 
A partial list of such software includes JETRAD, DYRAD, 
EXCALIBUR, (DI)PHOX, ZFITTER, RACOONWW and 
MADGRAPH. No central repository of such codes currently
exists in an organised and consistent form, so such a
repository will be of great benefit to the community.

A prototype version, HEPCODE, has been set up on 
the IPPP website (http://www.ippp.dur.ac.uk/hepcode/) at
Durham which includes, as well as a table of available pro- 
grams and links, a web form for submission of new codes 
to the repository. 



SUMMARY 

In summary we list the various tasks the CEDAR project 
will be engaged upon:- 

• Convert HEPDATA to a relational MySQL database. 

• Modify JetWeb to access data direct from HEPDATA. 

• Design and implement an OO replacement for HZ- 
TOOL. 

• Extend the number of HZTOOL routines/models 
available in JetWeb. 

• Include new generation C++ Monte Carlos. 

• Develop CEDAR grid tools for automatic validation 
and data access. 

• Develop standard document formats (e.g. XML) for 
experimental physics data and results, which can be 
used for example as methods for reaction data ex- 
port/import from experiments, as well as being useful 
in a more general context. 

• Develop a code repository for a wide range of HEP 
codes and integrate with the validation centre. 

• Incorporate LHC and, eventually, ILC data. 

Once established and running with all the latest codes 
and data sets, it is envisaged that CEDAR will become an 
important tool used by LHC experimentalists in their analyses.
Other branches of science which need to compare
data and complex simulations may also find the techniques
and tools used in CEDAR of benefit to their work.

THE CEDAR TEAM 

At UCL:- 

Jon Butterworth - J.Butterworth@ucl.ac.uk 

Susanna Butterworth - sb@hep.ucl.ac.uk 

Ben Waugh - waugh@hep.ucl.ac.uk 
At Durham:-

James Stirling - W.J.Stirling@durham.ac.uk 

Mike Whalley - M.R.Whalley@durham.ac.uk 

A N Other - to be appointed from 1st April 2005 
Web Links:-

CEDAR - http://cedar.ac.uk/

JetWeb - http://jetweb.hep.ucl.ac.uk/ 

HEPDATA - http://durpdg.dur.ac.uk/hepdata/ 

HZTOOL - http://hztool.hep.ucl.ac.uk/ 